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Abstract 

This article deals with the computation of the characteristic polynomial of dense matrices 
over small finite fields and over the integers. We first present two algorithms for the finite 
fields: one is based on Krylov iterates and Gaussian elimination. We compare it to an 
improvement of the second algorithm of Keller- Gehrig. Then we show that a generalization 
of Keller-Gehrig's third algorithm could improve both complexity and computational time. 
We use these results as a basis for the computation of the characteristic polynomial of integer 
matrices. We first use early termination and Chinese remaindering for dense matrices. Then 
a probabilistic approach, based on integer minimal polynomial and Hensel factorization, is 
particularly well suited to sparse and/or structured matrices. 

1 Introduction 

Computing the characteristic polynomial of an integer matrix is a classical mathematical problem. 
It is closely related to the computation of the Frobenius normal form which can be used to 
test two matrices for similarity. Although the Frobenius normal form contains more information 
on the matrix than the characteristic polynomial, most algorithms to compute it are based on 
computations of characteristic polynomial (see for example [23, §9.7] ). 

Using classic matrix multiplication, the algebraic time complexity of the computation of the 
characteristic polynomial is nowadays optimal. Indeed, many algorithms have a 0{n^) algebraic 
time complexity ( to our knowledge the older one is due to Danilevski, described in [13, §24]). 
The fact that the computation of the determinant is proven to be as hard as matrix multiplication 
[2] ensures this optimality. But with fast matrix arithmetic {0{n'^) with 2 < a; < 3), the best 
asymptotic time complexity is 0{n'^\ogn), given by Keller-Gehrig's branching algorithm [18]. Now 
the third algorithm of Keller-Gehrig has a 0{n'^) algebraic time complexity but only works for 
generic matrices. 

In this article we focus on the practicability of such algorithms applied on matrices over a 
finite field. Therefore we used the techniques developped in [5, 6], for efficient basic linear algebra 
operations over a finite field. We propose a new 0{n^) algorithm designed to take benefit of 
the block matrix operations; improve Keller-Gehrig's branching algorithm and compare these two 
algorithms. Then we focus on Keller-Gehrig's third algorithm and prove that its generalization is 
not only of theoretical interest but is also promising in practice. 

As an application, we show that these results directly lead to an efficient computation of the 
characteristic polynomial of integer matrices using Chinese remaindering and an early termination 
criterion adaptated from [7]. This basic application outperforms the best existing softwares on 
many cases. Now better algorithms exist for the integer case, and can be more efficients with 
sparse or structured matrices. Therefore, we also propose a probabilistic algorithm using a black- 
box computation of the minimal polynomial and our finite field algorithm. This can be viewed as 
a simplified version of the algorithm described in [24] and [17, §7.2]. Its efficiency in practice is 
also very promising. 
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2 Krylov's approach 



Among the different tecfiniques to compute the characteristic polynomial over a field, many of 
them rely on the Krylov approach. A description of them can be found in [13]. They are based on 
the following fact: the minimal linear dependance relation between the Krylov iterates of a vector 
V (i.e. the sequence {A^v)i gives the minimal polynomial ^^^^ sequence, and a divisor 

of the minimal polynomial of A. Moreover, if X is the matrix formed by the first independent 
column vectors of this sequence, we have the relation 

AX XCpm^r. 

where Cpmin is the companion matrix associated to P™™. 



2.1 Minimal polynomial 

We give here a new algorithm to compute the minimal polynomial of the sequence of the Krylov's 
iterates of a vector v and a matrix A. This is the monic polynomial P™™ of least degree such 
that P{A).v = 0. We firstly presented it in [22, 21] and it was simultaneously published in [20, 
Algorithm 2.14]. 

The idea is to compute the n x n matrix Ka,v (we call it Krylov's matrix), whose ith column 
is the vector A^u, and to perform an elimination on it. More precisely, one computes the LSP 
factorization of „ (see [14] for a description of the LSP factorization). Let k be the degree 
of PaT- This means that the first k columns of Ka,v are linearly independent, and the n — k 
following ones are linearly dependent with the first k ones. Therefore S is triangular with its last 
n — k rows equals to 0. Thus, the LSP factorization of iiT^ ^ can be viewed as in figure 1. 




. P 



Figure 1: principle of the computation of P™™ 

Now the trick is to notice that the vector m = L^+iL^^ gives the opposites of the coefficients 
of PX^". Indeed, let us define X = K\^^ 

fc-i 

Xi...n,k+i = {A'^v f = ^ m^iA^v f = m • Xi,..nA...k 

2 = 

where P^rC^) = - mkX''-^ niiX - mo. 

Thus 

Lk+iSP = m ■ Li„,kSP 

And finally m — Lk+i-L^^ 

The algorithm is then straightforward: 

The dominant operation in this algorithm is the computation of K, in log2n matrix multipli- 
cations, i.e. in 0{n'^logn) algebraic operations. The LSP factorization requires 0{n'^) operations 
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Algorithm 2.1 MinPoly : Minimal Polynomial of A and v 
Require: A a. n x n matrix and v a vector over a field 

Ensure: ^'^in (^) the minimal polynomial of the sequence of vectors {A'^v)i 

1: Ki,,,n,l = V 

2: for j = 1 to log2(n) do 

3- -^l...ra,2i...2»+i-l = -f^l...n,1...2>-l 

4: end for 

5: (L,S',P) = LSP(i^* ), A; = rank(i^) 

6; TO = Lk+l.L^^k 

7: return P^^^iX) = X'^ + Eto ^i^' 



and the triangular system resolution, ©(n^). The algebraic time complexity of this algorithm is 
thus 0{n'^logn). 

When using classical matrix multiplications (assuming a; = 3), it is preferable to compute the 
Krylov matrix ii" by A; successive matrix vector products. The number of field operations is then 

It is also possible to merge the creation of the Krylov matrix and its LSP factorization so as to 
avoid the computation of the last n — k Krylov iterates with an early termination approach. This 
rcduc;es the time complexity to O{n'^log{k)) for fast matrix arithmetic, and 0{'n?k) for classic 
matrix arithmetic. 

Note that choosing v randomly makes the algorithm Monte-Carlo for the computation of the 
minimal polynomial of A. 



2.2 LU-Krylov algorithm 

We present here an algorithm, using the previous computation of the minimal polynomial of the 
sequence {A^v)i to compute the characteristic polynomial of A. The previous algorithm produces 
the k first independent Krylov iterates of v. They can be viewed as a basis of an invariant subspacc 
under the action of A, and if P^™ = P^™, this subspace is the first invariant subspace of A. 
The idea is to make use of the elimination performed on this basis to compute a basis of its 
supplementary subspace. Then a recursive call on this second basis will decompose this subspace 
into a series of invariant subspaces generated by one vector. 

The algorithm is the following, where k, P, and S come from the notation of algorithm 2.1. 



Algorithm 2.2 LUK : LU-Krylov algorithm 



Require: A a, n x n matrix over a field 
Ensure: P^i^j.{X) the characteristic polynomial of A 



char 

Pick a random vector v 

-Pmin (^) = MinPoly {A, v) of degree k 



{X 
if {k 



Li 
n) 



[Si\S2\P is computed} 



then 



return P,t, = P^^" 



else 

A' = PA^P^ = 

7:>^22^^21'5l ■52 

char 



A' 
^11 

A' 

^21 



A 

A' 



12 



22 



where A'^ \s k x k. 



{X) = LUK{A'2 



return P,1,(X) 



P^C:iX) X P-- 



9: end if 



A21S1 ^82) 

^21'S'l ^■52 
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Theorem 2.1. The algorithm LU-Krylov computes the characteristic polynomial of an n x n 

matrix A in 0{n^) field operations. 



Proof. Let us use the following notations 

X = 



L2 



[Sl\S2]P 



As wc already mentioned, the first k rows of X (^i...fe,i...n) form a basis of the invariant 
subspace generated by v. Moreover wc have 



CpA,vXl,,k 



Indeed 
and 



Xi..kA'^ 

'iiKk XiA^ = {A^-^vY A^ = {A'vY = X^+i 



fe-i 

XkA^ = [A^'-^vf A^ = [A'^vf = Y,^i i^'^f 

The idea is now to complete this basis into a basis of the whole space. Viewed as a matrix, 
this basis form the nx n invertible matrix X. It is defined as follows: 



X 



' Li ■ 




Si S2 


P = 


In-k 




In-k 





Xi,,,k,l...n 
[ In-k ] P 



Let us compute 



XA^X~ 



[ /„_fe ] pa^p^s-'l-' 



[ ^21 ^22 








Y 


X2 



with 



X2 = A'22 — A21S1 ^82 



By a similarity transformation, we thus have reduced A to a block triangular matrix. Then 
the characteristic polynomial of A is the product of the characteristic polynomial of these two 
diagonal blocks: 



char 



min char 



Now for the time complexity, we will denote by Tuxin) the number of field operations for this 
algorithm applied on a n x n matrix, by Tminpoiy(n, k) the cost of the algorithm 2.1 applied on a 
nxn matrix having a degree k minimal polynomial, by Tlsp {m,n) the cost of the LSP factorization 
of a m X n matrix, by Ttrsm("^) n) the cost of the simultaneous resolution of m triangular systems 
of dimension n, and by Tyiif{m, k,n) the cost of the multiplication of a m x fc matrix by a fc x n 
matrix. 
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The values of Tlsp and Ttrsm can be found in [6]. Then, using classical matrix arithmetic, we 
have: 

TLm{n) = 7iiinpoiy(n, fc) +TLsp(A:,n) +Ttrsm('^ - fc, fc) 

+T^(n - k,k,n- k) + TlukC" - 
= 0{n^k + k^n + k'^{n-k) + k{n-kf) 

+Tum{n-k) 
= OiJ2n^ki+k^n) 

i 

= 0{n') 

The latter being true since ki = n and J2i < 

□ 

Note that when using fast matrix arithmetic, it is no longer possible to sum the log{ki) into 
log{n) or the k"~'^n? into n'^, so this prevents us from getting the best known time complexity 
of n^log{n) with this algorithm. We will now focus on the second algorithm of Keller-Gehrig 
achieving this best known time complexity. 

2.3 Improving Keller-Gehrig's branching algorithm 

In [18], Keller-Gehrig presents a so called branching algorithm, computing the characteristic poly- 
nomial of a n X n matrix over a field K in the best known time complexity of n^log{ri) field 
operations. 

The idea is to compute the Krylov iterates of a several vectors at the same time. More precisely, 
the algorithm computes a sequence of n x n matrices {Vi)i whose columns arc the Krylov's iterates 
of vectors of the canonical basis. Uo is the identity matrix (every vector of the canonical basis is 
present). At the i-th iteration, the algorithm computes the following 2' Krylov's iterates of the 
remaining vectors. Then a Gaussian elimination determines the linear dependencies between them 
so as to form Vi+i by picking the n linearly independent vectors. The algorithm ends when each 
Vi is invariant under the action of A. Then the matrix V~^AV is block diagonal with companion 
blocks on the diagonal. The polynomials of these blocks arc the minimal polynomials of the 
sequence of Krylov's iterates, and the characteristic polynomial is the product of the polynomials 
associated to these companion blocks. 

The linear dependencies removal is performed by a step-form elimination algorithm defined by 
Keller-Gehrig. Its formulation is rather sophisticated, and we propose to replace it by the column 
reduced form algorithm (algorithm 2.3) using the more standard LQUP factorization (described in 



Algorithm 2.3 ColReducedForm 

Require: A a, m x n matrix of rank r {m,n > r) over a field 

Ensure: A' a m x r matrix formed by r linearly independent columns of A 

1: (L, g, U, P, r) = LQUP(A^) (r = rank{A)) 

2: return ([/^0](Q^A^))^ 



[14]). More precisely, the step form elimination of Keller-Gehrig, the LQUP factorization of Ibarra 
& Al. and the echelon elimination (see e.g. [23]) are equivalent and can be used to determine the 
linear dependencies in a set of vectors. 

Our second improvement is to apply the idea of algorithm 2.1 to compute polynomials associ- 
ated to each companion block, instead of computing V~^AV. The Krylov's iterates are already 
computed, and the last call to ColReducedForm performed the elimination on it, so there only 
remains to solve the triangular systems so as to get the coefficients of each polynomial. 

Algorithm 2.4 sums up these modifications. The operations in the while loop have a 0{n'^) 
algebraic time complexity. This loop is executed at most log(n) times and the algebraic time 
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Algorithm 2.4 KGB: Keller-Gehrig Branching algorithm 

Require: A a. n x n matrix over a field 

Ensure: P^i^j.{X) the characteristic polynomial of A 

1: i = 

2: Vo = In = {Vo,l, Vo,2, • • • , Vo,n) 
3: B = A 

4: while {3k, Vk has 2' columns) do 
5: for all j do 

6: if {Vi j has strictly less than 2* columns ) then 
7: Wj = Vij 

8: else 

9: W, = [Vi,,mj] 

10: end if 

11: end for 

12: W = {Wj)j 

13: Vi+i = ColReducedForm(W) 

{Vi+ij are the remaining vectors of Wj in V^+i} 

14: B = Bx B 

15: i = i + l 

16: end while 
17: for all j do 

18: compute Pj the minimal polynomial of the sequence of vectors of Vi-ij, using algorithm 
2.1 

19: end for 

20: return n,P, 



complexity of the algorithm is therefore 0(n'^log(n)). More precisely it is 0(n'^log(A;max)) where 
fcmax is the degree of the largest invariant factor. 

2.4 Experimental comparisons 

To implement these two algorithms, we used a finite field representation over double size float- 
ing points: modular<double> (sec [6]) and the efficient routines for finite field linear algebra 
FFLAS-FFPACK presented in [6, 5]. The following experiments used a classic matrix arithmetic. We 
ran them on a series of matrices of order 300 which Frobenius normal forms had different number 
of diagonal companion blocks. Figure 2 shows the computational time on a Pentium IV 2.4Ghz 
with 512Mb of RAM. 

It appears that LU-Krylov is faster than KGB on every matrices. This is due to the extra log(n) 

factor in the time complexity of the latter. One can note that the computational time of KGB is 
decreasing with the number of blocks. This is due to the fact that the log(n) is in fact log(A:max 
where fcmax is the size of the largest block. This factor is decreasing when the number of blocks 
increases. Conversely, LU-Krylov computational time is almost constant. It slightly increases, 
due to the increasing number of rectangular matrix operations. The latter being less efficient than 
square matrix operations. 

3 Toward an optimal algorithm 

As mentioned in the introduction, the best known algebraic time complexity for the computation 

of the characteristic polynomial is not optimal in the sense that it is not 0{n'^) but C'(n"log(n)). 
However, Keller-Gchrig gives a third algorithm (let us name it KG3), having this time complexity 
but only working on generic matrices. 
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KGB vs LU-Krylov for a 300x300 matrix over GF(1 01 ) 




5 10 15 20 25 30 35 40 45 50 

Number of blocks 



Figure 2: LU-Krylov vs. KGB 

To get rid of the extra log(n) factor, it is no longer based on a Krylov approach. The algorithm 
is inspired by a 0(n^) algorithm by Danilevski (described in [13]), improved into a block algorithm. 
The genericity assumption ensures the existence of a series of similarity transformations changing 
the input matrix into a companion matrix. 



3.1 Comparing the constants 



The optimal "big-0" complexity often hides a large constant in the exact expression of the time 
complexity. This makes these algorithms impracticable since the improvement induced is only 
significant for huge matrices. However, we show in the following lemma that the constant of KG3 
has the same magnitude as the one of LUK. 

Lemma 3.1. The computation of the characteristic polynomial of a n x n generic matrix using 
KG3 algorithm requires K^n'^ + o{n^) algebraic operations, where 

2^-1 



1 



2(2'- 



1)(2- 



1)(2- 



1 



1 



(2'^-2 - l)(2'^-i - 1) 
1 



(2'^-2 - 1)(2'^ - 1) 2(2'^-2 - l)(2'^-i - 1)2_ 
and Cuj is the constant in the algebraic time complexity of the matrix multiplication. 
The proof and a description of the algorithm are given in appendix A. 

In particular, when using classical matrix arithmetic (u = 3, Ci^ = 2), we have on the one hand 
= 176/63 « 2.794. 
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On the other hand, the algorithm 2.2 called on a generic matrix simply computes the n Krylov 
vectors A^v {2n^ operations), computes the LUP factorization of these vectors (2/3n^ operations) 
and the coefhcients of the polynomial by the resolution of a triangular system {0{n^)). Therefore, 
the constant for this algorithm is 2 + 2/3 w 2.667. These two algorithms have thus a similar 
algebraic complexity, LU-Krylov being slightly faster than Keller-Gehrig's third algorithm. We 
now compare them in practice. 

3.2 Experimental comparison 

We claim that the study of precise algebraic time complexity of these algorithms is worth-full in 
practice. Indeed these estimates directly correspond to the computational time of these algorithms 
applied over finite fields. Therefore we ran these algorithms on a small prime finite field (word size 
elements with modular arithmetic). Again we used modular<double> and FFLAS-FFPACK. These 
routines can use fast matrix arithmetic, we, however, only used classical matrix multiplication so 
as to compare two 0{ti'^) algorithms having similar constants (2.67 for LUK and 2.794 for KG3). 
We used random dense matrices over the finite field ^65521, as generic matrices. We report the 
computational speed in Mfops (Millions of field operations per second) for the two algorithms on 
figure 3: 

KG3 vs LU-Krylov over Z/65521Z 
1800 I ^ ^ ^ ^ ^ ^ 1 




200 ' ' ' ' ' ' ' ' 

500 1000 1500 2000 2500 3000 3500 

Matrix order 

Figure 3: LUK vs. KG3: speed comparison 

It appears that LU-Krylov is faster than KG3 for small matrices, but for matrices of order 
larger than 1500, KG3 is faster. Indeed, the 0{n^) operations are differently performed: LU- 
Krylov computes the Krylov basis by n matrix-vector products, whereas KG3 only uses matrix 
multiplications. Now, as the order of the matrices gets larger, the BLAS routines provides better 
efficiency for matrix multiplications than for matrix vector products. Once again, algorithms 
exclusively based on matrix multiplications are preferable: from the complexity point of view, 
they make it possible to achieve 0{n'^) time complexity. In practice, they promise the best 
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efficiency thanks to the BLAS better memory management. 

4 Over the Integers 

There exist several algorithms to compute the characteristic polynomial of an integer matrix. A 
first idea is to perform the algebraic operations over the ring of integers, using exact divisions [1] 
or by avoiding divisions [3, 15, 8, 17]. We focus here on field approaches. Concerning the bit com- 
plexity of this computation, a first approaeli, using Chinese remaindering gives 0~ (ri'^'+^log|| A||) 
bit operations (0~ is the "soft-0" notation, hiding logarithmic and poly- logarithmic factors in 
n and ||^||). Baby-step Giant-step techniques apphed by Eberly [8] improves this complexity to 
0~ (n^'^log||j4||) (using classic matrix arithmetic). Lastly, the recent improvement of [17, §7.2], 
combining Coppersmith's block- Wiedemann techniques [4, 16, 26, 25] set the best known exponent 
for this computation to 2.697263 using fast matrix arithmetic. 

Our goal here is not to give an exhaustive comparison of these methods, but to show that a 
straightforward application of our finite field algorithm LU-Krylov is already very efficient and 
can outperform the best existing softwares. 

A first deterministic algorithm, using Chinese remaindering is given in section 4.1. Then wc 
improve it in section 4.2 into a probabilistic algorithm by using the early termination technique of 
[7, §3.3]. Therefore, the minimal number of homomorphic computations is achieved. Now, for the 
sparse case, a recent alternative [24], also developed in [17, §7.2], change the Chinese remaindering 
by a Hensel p-adic lifting in order to improve the binary complexity of the algorithm. In section 
4.4, we combine some of these ideas with the Sparse Integer Minimal Polynomial computation of [7] 
and our dense modular characteristic polynomial to present an efficient practical implementation. 

4.1 Dense deterministic : Chinese remaindering 

The first naive way of computing the characteristic polynomial is to use Hadamard's bound [10, 
Theorem 16.6] to show that any integer coefficient of the characteristic polynomial has the order 
of n bits: 

Lemma 4.1. Let A g Z"^", with n > 4, whose coefficients are bounded in absolute value by B > 1. 
The coefficients of the characteristic polynomial of A have less than [§ (log2(n) -|- log2(i?^) -|- 1.6669)] 
bits. 

Proof. Ci, the i-th coefficient of the characteristic polynomial, is a sum of all the {n — i) x (n — i) 

diagonal minors of A. It is therefore bounded by (") \/{n — i)B'^^" '\ The lemma is true for 
i = n since the characteristic polynomial is unitary and also true for i = by Hadamard's bound. 

Now, using Stirling's formula (n! < (1 + 6)^/2^f ), one gets (^) < ^±4^1^(1)^ (^)""'- 

Thus log2(ci) < 2. (log2(ri) -I- log2(i3^) + C) — H. Well, suppose on the first hand that « < f , 

then H (j;^ - 1 + C)| + -^^^2) + ~ 2in{2) ■ second hand, if {n - i) < |, then 

^ ^ ^1^2) ~ -'■)'^ + nfn(2) + ~ 2lnX2) ' '^^^'^^ k = n — i. Both equivalences are positive as soon 
as C <"l.666897920ir " □ 

For example, the characteristic polynomial of 



1 


1 


1 


1 


1 


1 


1—1 


-1 


-1 


-1 


1 


-1 


1 


-1 


-1 


1 


-1 


-1 


1 


-1 


1 


-1 


-1 


-1 





is - 5X^ + 40^2 - 80X + 48 and 80 = {l)Vt is greater than Hadamard's bound 56, but less 
than our bound 1004.4. 
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Note that this bound improves the one used in [11, lemma 2.1] since 1.6669 < 2 + log2(e) 
3.4427. 

Now, using fast integer arithmetic and the fast Chinese remaindering algorithm [10, Theorem 
10.25], one gets the overall complexity for the dense integer characteristic polynomial via Chinese 
remaindering of 

0{n\log{n) + log{B))). 

Well, as we see in next section, to go faster, the idea is actually to stop the remaindering earlier. 
Indeed, the actual coefficients can be much smaller than the bound of lemma 4.1. 

4.2 Dense probabilistic Monte-Carlo : early termination 

We just use the early termination of [7, §3.3]. There it is used to stop the remaindering of 
the integer minimal polynomial, here we use it to stop the remaindering of the characteristic 
polynomial: 

Lemma 4.2. [7] Let v G li be a coefficient of the characteristic polynomial, and U be a given 
upper bound on \v\. Let P be a set of primes and let {pi . . .pk,P*} be a random subset of P. Let 
I be a lower bound such that p* > I and let M = ni=iP«- Vk = v mod M, v* = v mod p* 
and vl = Vk mod p* as above. Suppose now that v^. = v* . Then v = Vk with probability at least 

^ |P| • 

The proof is that of [7, lemma 3.1]. The probabilistic algorithm is then straightforward: after 

each modular computation of a characteristic polynomial, the algorithm stops if every coefficient 
is unchanged. It is of the Monte-Carlo type: always fast with a controlled probability of success. 
The probability of success is bounded by the probability of lemma 4.2. In practice this probability 
is much higher, since the n coefficients are checked. But since they are not independent, we are 
not able to produce a tighter bound. 

4.3 Experimental results 

We implemented these two methods using LU-Krylov over finite fields as described in section 2.4. 
The choice of moduli is there linked to the constraints of the matrix multiplication of FFLAS. 
Indeed, the wrapping of numerical BLAS matrix multiplication is only valid if n{p — 1)^ < 2^^ 
(the result can be stored in the 53 bits of the double mantissa). Therefore, we chose to sample 
the primes between 2™ and 2™+^ (where m = [25.5 — ilog2(n)J). This set was always sufficient 
in practice. Even with 5000 x 5000 matrices, m = 19 and there are 38658 primes between 2^® and 
2^^*. Now if the coefficients of the matrix are between —1000 and 1000, the upper bound on the 
coefficients of the characteristic polynomial is log2™(C/) « 4458.7. Therefore, the probability of 
finding a bad prime is lower than 4458.7/38658 « 0.1153. Then performing a couple a additional 
modular computations to check the result will improve this probability. In this example, only 17 
more computations (compared to the 4459 required for the deterministic computation) are enough 
to ensure a probability of error lower than 2~^°, for which Knuth [19, §4.5.4] considers that there 
is more chances that cosmic radiations perturbed the output! 

In the following, we denote by ILUK-det the deterministic algorithm of section 4. 1 . by ILUK-prob 
the probabilistic algorithm of section 4.2 with primes chosen as above and by ILUK-QD the quasi- 
deterministic algorithm obtained by applying ILUK-prob plus a sufficient number of modular 
computations to ensure a probability of failure lower than 2^^°. 

We report in table 1 the timings of their implementations, compared to the timings of the same 
computation using Maple-v9 and Magma-2.11. We ran these tests on an athlon 2200 (1.8 Ghz) 
with 2Gb of RAM, running Linux-2.4.^ The matrices are formed by integers chosen uniformly 
between and 10: therefore, their minimal polynomial equals their characteristic polynomial. 

^We are grateful to the Medicis computing center hosted by the CNRS STIX laboratory : http://medicis. 
polytechnique.fr/medicis/. 
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TT nV—r\oi- 
±1jU]\ Cits b 




TT TTT<'-nn 


100 


163s 


0.34s 


0.22s 


0.17s 


0.2s 


200 


3355s 


4.45s 
11. 1Mb 


4.42s 

3.5Mb 


3.17s 

3.5Mb 


3.45s 

3.5Mb 


400 


74970s 


69.8s 
56Mb 


91.87s 
10.1Mb 


64.3s 
10.1Mb 


66.75s 
10.1Mb 


800 




1546s 
403Mb 


1458s 
36.3Mb 


1053s 
36.3Mb 


1062s 
36.3Mb 


1200 




8851s 
1368Mb 


7576s 
81Mb 


5454s 
81Mb 


5548s 
81Mb 


1500 




MT 


21082s 

136Mb 


15277s 

136Mb 


15436s 

136Mb 


2000 




MT 


66847s 
227Mb 


46928s 
227Mb 




2500 




MT 


169355s 
371Mb 


124505s 
371Mb 




3000 




MT 


349494s 
521Mb 


254358s 
521Mb 





Table 1: Characteristic polynomial of a dense integer matrix of order n (computation time in 
seconds and memory allocation in Mb) 

The implementation of Berkowitz algorithm used by Maple has prohibitive computational 
timings. Magma is much faster thanks to a p_adic algorithm (probabilistic ?) ^. However, no 
literature exists to our knowledge, describing this algorithm. Our deterministic algorithm has 
similar computational timings and gets faster for large matrices. For matrices of order over 800, 
magma tries to allocate more than 2Gb of RAM, and the computation crashes (denoted by MT as 
Memory Thrashing). The memory usage of our implementations is much smaller than in magma, 
and makes it possible to handle larger matrices. 

The probabilistic algorithm ILUK-prob improves the computational time of the deterministic 
one of roughly 27 %, and the cost of the extra checks done by ILUK-QD is negligible. 

However, this approach does not take advantage of the structure of the matrix nor of the 
degree of the minimal polynomial, as magma seems to do. In the following, we will describe a third 
approach to fill this gap. 

4.4 Structured or Sparse probabilistic Monte-Carlo 

By structured or sparse matrices we mean matrices for which the matrix-vector product can be 
performed with less than n? arithmetic operations or matrices having a small minimal polynomial 
degree. In those cases our idea is to compute first the integer minimal polynomial via the spe- 
cialized methods of [7, §3] (denoted by IMP), to factor it and them to simply recover the factor 
exponents by a modular computation of the characteristic polynomial. The overall complexity is 
not better than e.g. [24, 17] but the practical speeds shown on table 2 speak for themselves. The 
algorithm is as follows: 

Theorem 4.3. Algorithm 4-1 is correct. It is probabilistic of the Monte-Carlo type. Moreover, 

most cases where the result is wrong are identified. 

Proof. Let P™'" be the integer minimal polynomial of A and P™™ the result of the call to IMP. 

With a probability of ^/T^^, P™" = pmm xhen the only problem that can occur is that 
an irreducible factor of P™™ divides another factor when taken modulo p, or equivalently, that 
p divides the resultant of these polynomials. Now from [10, Algorithm 6.38] and lemma 4.1 an 
upper bound on the size of this resultant is log2 {\/n+l 2"+^B + 1). Therefore, the probability 

^http://www.msri . org/inf o/computing/(iocs/magma/text751 .htm 
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Algorithm 4.1 CIA : Characteristic polynomial over Integers Algorithm 



Require: A G Z"^", even as a blackbox, e. 

Ensure: The characteristic polynomial of A with a probability of 1 — e. 
1; 77 = 1 - a/1 - e 
2: P,4„ = IMP(A?7) via [7, §3]. 

3; Factor PA^^ over the integers, e.g. by Hensel's lifting. 

4: B = 2f (l°S2(")+log2(ll^ll')+l-6669) 

5: Choose a random prime p in a set of ilog2(\/'^ + 1 2"+^i? + 1) primes. 

6; Compute Pp the characteristic polynomial of A mod p via LUK. 

7: for all fi irreducible factor of do 

8: Compute /j = fi mod p. 

9; Find ai the multiplicity of /j within Pp. 
10: if a, == then 
11: Return "FAIL". 
12: end if 
13: end for 

14: Compute Pi^^ =Ufr=X-- ^7=0 
15: if {J2 aidegree(/i) n) then 
16: Return "FAIL". 
17: end if 

18: if {Trace{A) ^ a„_i ) then 

19: Return "FAIL". 

20: end if 

21: Return P^^^. 



of choosing a bad prime is less than 77. Thus the result will be correct with a probability greater 
than 1 — e □ 

This algorithm is also able to detect most erroneous results and return "FAIL" instead. We 
call it therefore "Quasi-Las- Vegas" . 

The first case is when p™™ = pmm ^ factor of P™™ divides another factor modulo p. 
In such a case, the exponent of this factor will appear twice in the reconstructed characteristic 
polynomial. The overall degree being greater than n, FAIL will be returned. 

Now, jf P""" ^ P""™, the tests a, > will detect it unless P""™ is a divisor of P"'", say 
pmm _ pmmQ ^j^^^^ case, ou the one hand, if Q does not divide P™™ modulo p, the total 
degree will be lower than n and FAIL will be returned. On the other hand, a wrong characteristic 
polynomial will be reconstructed, but the trace test will detect most of these cases. 

We now compare our algorithms to magma. In table 2, we denote by d the degree of the 
integer minimal polynomial and by oj the average number of nonzero elements per row within the 
sparse matrix. CIA is written in C++ and uses different external modules: the integer minimal 
polynomial is computed with LinBox^ via [7, §3], the polynomial factorization is computed with 
NTL^ via Hensel's factorization. 

We show the computational times of algorithm 4.1 (CIA), decomposed into the time for the 
integer minimal polynomial computation (IMP), the factorization of this polynomial (Fact), the 
computation of the characteristic polynomial and the computation of the multiplicities (LUK+Mul). 
They are compared to the timings of the algorithms of section 4.1 and 4.2. 

We used two sparse matrices A and B of order 300 and 600, having a minimal polynomial of 
degree respectively 75 and 424. A is the almost empty matrix FrobOSblocks and is in Frobenius 
normal form with 8 companion blocks and B is the matrix ch5-5.b3. 600x600. sms presented in 

[JV 

^ www . 1 inalg . org 
*www . shoup . net/ntl 
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Matrix 


A 


U-^AU 


A^'A 


B 




B 


n 


300 


300 


300 


600 


600 


600 


d 


75 


75 


21 


424 


424 


8 




1.9 


300 


2.95 


4 


600 


13 


ILUK-prob 


1.3 


1.5 


18.3 


31.8 


34.9 


120.0 


ILUK-det 


37.5 


121.7 


265.0 


310 


3412 


422.3 


Magma 


1.4 


16.5 


0.2 


6.2 


184.0 


6.0 


CIA 


0.32 


3.72 


0.86 


4.51 


325.1 


2.4 


IMP 


0.01 


3.38 


0.01 


1.49 


322.1 


0.04 


Fact 


0.05 


0.05 


0.01 


0.76 


0.76 


0.01 


LUK+Mul 


0.26 


0.29 


0.84 


2.26 


2.26 


2.30 



Table 2: CIA on sparse or structured matrices 

On these matrices magma is pretty efficient thanfcs to their sparsity. The early termination in 
ILUK-prob gives similar timings for since the coefficients of its characteristic polynomial are 
small. But this is not the case with B. ILUK-det performs many useless operations since the 
Hadamard bound is well overestimating the size of the coefficients. CIA also takes advantage of 
both the sparsity and the low degree of the minimal polynomial. It is actually much faster than 
magma for A and is slightly faster for B (the degree of the minimal polynomial is bigger). 

Then, we made these matrices dense with an integral similarity transformation. The lack of 
sparsity slows down both magma and CIA, whereas ILUK-prob maintains similar timings. ILUK-det 
is much slower because the bigger size of the matrix entries increases the Hadamard bound. 

Lastly, we used symmetric matrices with small minimal polynomial {A^ A and B^B). The 
bigger size of the coefficients of the characteristic polynomial makes the Chinese remainder methods 
of ILUK-prob and ILUK-det slower. CIA is still pretty efficient ( the best on B^ B ), but magma 
appears to be extremely fast on A. 

We report in table 3 on some comparisons using other sparse matrices''. 



Matrix 


n 




magma 


CIA 


ILUK-QD 


TF12 


552 


7.6 


10.03s 


6.93s 


51.84s 


TrefBOO 


500 


16.9 


108.1s 


64.58s 


335.04s 


mk9b3 


1260 


3 


77.02s 


35.74s 


348.31s 



Table 3: CIA on other sparse matrices 

To conclude, ILUK-det is always too expensive, although it has better timings than magma for 
large dense matrices (cf. table 1). ILUK-prob is well suited for every kind of matrix having a 
characteristic polynomials with small coefficients. Now with sparse or structured matrices, magma 
and CIA are more efficient; CIA being almost always faster. 

5 Conclusion 

We presented a new algorithm for the computation of the characteristic polynomial over a finite 
field, and proved its efficiency in practice. We also considered Keller-Gehrig's third algorithm and 
showed that its generalization would be not only interesting in theory but produce a practicable 
algorithm. 

We applied our algorithm for the computation of the integer characteristic polynomial in two 
ways: a combination of Chinese remaindering and early termination for dense matrix compu- 
tations, and a mixed blackbox-dense algorithm for sparse or structured matrices. These two 
algorithm outperform the existing software for this task. Moreover we showed that the recent im- 
provements of [24, 17] should be highly practicable since the successful CIA algorithm is inspired 
by their ideas, ft remains to show how much they improve the simple approach of CIA. 

^These matrices are available at http://www-lmc.imag.fr/lmc-mosaic/Jean-Guillaume.Dumas/Matrices 
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To improve the dense matrix computation over a finite field, one should consider the generaliza- 
tion of Keller-Gehrig's third algorithm. At least some heuristics could be built: using row-reduced 
form elimination to give produce generic rank profile. 

Lastly, concerning the sparse computations, the blackbox algorithms of [27] and of [9], could 
handle huge sparse matrices (no dense computation is used as in CIA). But one should study how 
their use of preconditionners, expensive in practice, penalize them. 
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A On Keller-Gehrig's third algorithm 



We first recall the principle of this algorithm, so as to determine the exact constant in its algebraic 
time complexity. This advocates for its practicability. 



A.l Principle of the algorithm 

First, let us define a m-Frobenius form as a n x n matrix of the shape: 







Ml 

_Idn~rn M2J ' 

Note that a 1-Frobenius form is a companion matrix, which characteristic polynomial is given 
by the opposites of the coefficients of its last column. 

The aim of the algorithm is to compute the 1-Frobenius form Aq of A by computing the 
sequence of matrices Ar = A, . . . ,Ao, where Ai has the 2*-Froebenius form and r = [logn] . The 
idea is to compute A^ from Aj+i by slicing the block M of ^j+i into two n x 2* columns blocks B 
and C. Then, similarity transformations with the matrix 



U = 





Id„-2i 



C2 



will "shift" the block B to the left and generate an identity block of size 2' between B and C. 



A.- 

























Id 













Id 





A = 

ij+i 










Q 
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^2 





b" 


Id 


i,i+i 


u = 


Id 



2' 



j2' 



2' 



2' 0+1)2' 2' 



n-2' 



Figure 4: Principle of Keller-Gehrig's third algorithm 

More precisely, the algorithm computes the sequence of matrices o — ^i+i, ^i.ii ■ ■ ■ i Ai,si = 
Ai, where Sj = [n/2'] — 1, by the relation Aij+i = Uj^^AijUij, whith the notations of figure 4. 

As long as Ci is invertible, the process will carry on, and make at last the block B disapear 
from the matrix. This last condition is restricting and is the reason why this algortihm is only 
valid for generic matrices. 

A. 2 Proof of lemma 3.1 

Lemma 3.1. The computation of the characteristic polynomial of a n x n generic matrix using 
the fast algorithm requires K^n"^ + o{n'^) algebraic operations, where 



= a 
+ 



1 



2(2'^-2 - l)(2'^-i - l)(2'^ - 1) 2'^-! 
1 3 2 



+ 



(20-2 _ i)(2w-i _ 1) 2^^-! - 1 2'^-2 - 1 

1 2""2 



' (20.-2 _ l)(-2a; _ 1) 2(2^^-2 - 1)(2'^-1 - 1)2 _ 

and Cuj is the constant in the algebraic time complexity of the matrix multiplication. 
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Proof. We will denote by Xa...b the submatrix composed by the rows from o to 6 of the block X. 
For a given i, KG3 performs n/2* similarity transformations. Each one of them can be described 
by the following operations: 

1- Ki-2^ + l...n = Cl..^2'-^l-2' 

^l...n-2i = -C'2> + i...„-B^_2i + i...„ + S2* + l...n 

C — B Cx^i x^2i 

^2' + 1...2HA+ = 

^2-+\+l...n+ = C'2i+A+l...u 

The first operation is a system resolution, and consists in a LUP factorization and two trian- 
gular system solve with matrix right hand side. The two following ones are matrix multiplications, 
and we do not consider the two last ones, since their cost is dominated by the previous ones. The 
cost of a similarity transformation is then: 

Tij = Tlup(2*, 2*) + 2Ttrsm(2\ 2*) 

+TMM(n - 2\ 2\ T) + Tuuin, 2\ T) 

From [6, Lemma 4.1] and [21], we have 

/ rfUJ — 2 1 



'LUP I 



(m, n) = ; m I n — m 



and 



Therefore 



iTRSM(2 ,2 j - 2(2^-1 _ 1) 



T- ■ = C^M" + f2M 

^^,3 2 (2"-2 - 1) (2'^-i - 1) (2'^-2-l)^^ 

+a(n ^ 2^)(2*)"-i + an(2^r-i 
(2—2 - 1) (2—1 - 1) " ^ 



And so the total cost of the algorithm is 



login/2) n/2'-l 

1=1 j=i 

log(n/2) 

= E (5-l)a£'a,(2T + 2na(2') 



And since 



log{n/2) 

-- a E (-Da,-3)n(2')'^-i+2n2(2')'^-2 

-i)a,(2^)" 
log{n/2) 

y (2Y = - i = -^+o(n^) 

/ J ^ ■' 2^ — 1 2^ — 1 

i=l 



16 



we get the result: 



T = 



+ 



2-c- 



1 



2(2^^-2 _ i)(2c^-i _ i)(2c^ _ 1) - 1 

1 3 2 



+ 



(2'^-2 - l)(2'^-i - 1) 2'^-! - 1 2'^-2 - 1 

1 2^^-^ 
+ 



(2'^-2 - 1)(2'^ - 1) 2(2^^-2 - l)(2'^-i - 1)2 



□ 
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